{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Baseball Demo"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This notebook contains a detailed example, demonstrating the typical workflow Graft aims to support. \n",
    "The dataset used here was constructed by splicing together two separate datasets:\n",
    "\n",
    "1. `SOCR Data MLB HeightsWeights`: Heights, ages and weights of Baseball players (Vertex Data). References:\n",
    "  * Jarron M. Saint Onge, Patrick M. Krueger, Richard G. Rogers. (2008) Historical trends in height,\n",
    "  weight, and body mass: Data from U.S. Major League Baseball players, 1869-1983, Economics & Human\n",
    "  Biology, Volume 6, Issue 3, Symposium on the Economics of Obesity, December 2008, Pages 482-488,\n",
    "  ISSN 1570-677X, DOI: 10.1016/j.ehb.2008.06.008.\n",
    "  * Jarron M. Saint Onge, Richard G. Rogers, Patrick M. Krueger. (2008) Major League Baseball Players'\n",
    "  Life Expectancies, Southwestern Social Science Association, Volume 89, Issue 3, pages 817–830,\n",
    "  DOI: 10.1111/j.1540-6237.2008.00562.x.\n",
    "2. `Advogato Trust Network` : Edge weights between 0 and 1. References:\n",
    "  * Advogato network dataset -- KONECT, July 2016. [http](http://konect.uni-koblenz.de/networks/advogato)\n",
    "  * Paolo Massa, Martino Salvetti, and Danilo Tomasoni. Bowling alone and trust decline in social network\n",
    "  sites. In Proc. Int. Conf. Dependable, Autonomic and Secure Computing, pages 658--663, 2009.\n",
    "\n",
    "The dataset has 6541 vertices, 51127 edges.\n",
    "Vertex properties: Age, Height(cm), Weight(kg)\n",
    "Edge properties  : Trust(float)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO: Recompiling stale cache file /Users/pranav/.julia/lib/v0.5/Graft.ji for module Graft.\n",
      "  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current\n",
      "                                 Dload  Upload   Total   Spent    Left  Speed\n",
      "100 1900k  100 1900k    0     0   163k      0  0:00:11  0:00:11 --:--:--  288k\n"
     ]
    }
   ],
   "source": [
    "## Load and summarize the graph.\n",
    "using Graft\n",
    "using StatsBase\n",
    "import LightGraphs\n",
    "\n",
    "# Load the graph\n",
    "download(\n",
    "\"https://raw.githubusercontent.com/pranavtbhat/Graft.jl/gh-pages/Datasets/baseball.txt\",\n",
    "joinpath(Pkg.dir(\"Graft\"), \"examples/baseball.txt\")\n",
    ");\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Graph(6541 vertices, 51127 edges, Symbol[:Age,:Height,:Weight] vertex properties, Symbol[:Trust] edge properties)"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "g = loadgraph(joinpath(Pkg.dir(\"Graft\"), \"examples/baseball.txt\"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(6541,51127)"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Get the graph's size\n",
    "size(g)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Get an iterator over the graph's edges\n",
    "edges(g)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# List vertex labels\n",
    "encode(g)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Split the graph into vertex and edge descriptors\n",
    "V,E = g;"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "│ VertexID │ Labels       │ Age   │ Height │ Weight  │\n",
       "├──────────┼──────────────┼───────┼────────┼─────────┤\n",
       "│ 1        │ \"gc\"         │ 26.03 │ 182.88 │ 104.545 │\n",
       "│ 2        │ \"prigaux\"    │ 25.43 │ 175.26 │ 84.0909 │\n",
       "│ 3        │ \"fred\"       │ 24.51 │ 182.88 │ 93.6364 │\n",
       "│ 4        │ \"quintela\"   │ 31.81 │ 193.04 │ 86.3636 │\n",
       "│ 5        │ \"jgarzik\"    │ 27.32 │ 185.42 │ 90.9091 │\n",
       "│ 6        │ \"penso\"      │ 25.5  │ 185.42 │ 86.3636 │\n",
       "│ 7        │ \"leviramsey\" │ 32.68 │ 182.88 │ 90.9091 │\n",
       "│ 8        │ \"havardk\"    │ 30.22 │ 182.88 │ 88.6364 │\n",
       "│ 9        │ \"sh\"         │ 28.8  │ 182.88 │ 90.9091 │\n",
       "│ 10       │ \"zappy\"      │ 29.54 │ 193.04 │ 104.545 │\n",
       "│ 11       │ \"ollesson\"   │ 29.12 │ 180.34 │ 106.818 │\n",
       "⋮\n",
       "│ 6530     │ \"boerner\"    │ 30.46 │ 187.96 │ 88.1818 │\n",
       "│ 6531     │ \"barismetin\" │ 32.68 │ 193.04 │ 89.5455 │\n",
       "│ 6532     │ \"baris\"      │ 28.11 │ 180.34 │ 88.6364 │\n",
       "│ 6533     │ \"obritim\"    │ 26.63 │ 187.96 │ 84.0909 │\n",
       "│ 6534     │ \"arabouma36\" │ 24.15 │ 193.04 │ 81.8182 │\n",
       "│ 6535     │ \"MachX\"      │ 26.49 │ 190.5  │ 90.9091 │\n",
       "│ 6536     │ \"seahawk\"    │ 22.89 │ 190.5  │ 90.9091 │\n",
       "│ 6537     │ \"xchaix\"     │ 28.53 │ 177.8  │ 95.4545 │\n",
       "│ 6538     │ \"sktrdie\"    │ 25.35 │ 180.34 │ 86.3636 │\n",
       "│ 6539     │ \"KellyHo\"    │ 22.34 │ 185.42 │ 95.4545 │\n",
       "│ 6540     │ \"mike1086\"   │ 23.45 │ 190.5  │ 86.3636 │\n",
       "│ 6541     │ \"thelema\"    │ 29.99 │ 193.04 │ 88.6364 │"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Display the vertex table\n",
    "V"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "│ Index │ Source        │ Target        │ Trust     │\n",
       "├───────┼───────────────┼───────────────┼───────────┤\n",
       "│ 1     │ \"gc\"          │ \"gc\"          │ 0.42739   │\n",
       "│ 2     │ \"gc\"          │ \"prigaux\"     │ 0.978998  │\n",
       "│ 3     │ \"gc\"          │ \"fred\"        │ 0.714178  │\n",
       "│ 4     │ \"gc\"          │ \"penso\"       │ 0.999861  │\n",
       "│ 5     │ \"gc\"          │ \"leviramsey\"  │ 0.993962  │\n",
       "│ 6     │ \"gc\"          │ \"sh\"          │ 0.336044  │\n",
       "│ 7     │ \"gc\"          │ \"fxn\"         │ 0.0949308 │\n",
       "│ 8     │ \"gc\"          │ \"chromatic\"   │ 0.778156  │\n",
       "│ 9     │ \"gc\"          │ \"strider\"     │ 0.874019  │\n",
       "│ 10    │ \"gc\"          │ \"sdodji\"      │ 0.282097  │\n",
       "│ 11    │ \"gc\"          │ \"Nyco\"        │ 0.142455  │\n",
       "⋮\n",
       "│ 51116 │ \"hulver\"      │ \"hulver\"      │ 0.410183  │\n",
       "│ 51117 │ \"asanders\"    │ \"asanders\"    │ 0.754016  │\n",
       "│ 51118 │ \"Aracnus\"     │ \"Aracnus\"     │ 0.868275  │\n",
       "│ 51119 │ \"billstewart\" │ \"billstewart\" │ 0.976183  │\n",
       "│ 51120 │ \"boerner\"     │ \"slok\"        │ 0.864808  │\n",
       "│ 51121 │ \"baris\"       │ \"barismetin\"  │ 0.934414  │\n",
       "│ 51122 │ \"MachX\"       │ \"arabouma36\"  │ 0.059897  │\n",
       "│ 51123 │ \"seahawk\"     │ \"seahawk\"     │ 0.935393  │\n",
       "│ 51124 │ \"xchaix\"      │ \"xchaix\"      │ 0.966611  │\n",
       "│ 51125 │ \"sktrdie\"     │ \"sktrdie\"     │ 0.323029  │\n",
       "│ 51126 │ \"KellyHo\"     │ \"KellyHo\"     │ 0.404737  │\n",
       "│ 51127 │ \"mike1086\"    │ \"mike1086\"    │ 0.370529  │"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Display the edge table\n",
    "E"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "26.23778373854929"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Find the average BMI of baseball players\n",
    "@query(g |> eachvertex(v.Weight / (v.Height / 100) ^ 2)) |> mean"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "6.166666864000001"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Find the median height of baseball players in their 20s\n",
    "@query(g |> filter(v.Age < 30,v.Age >= 20) |> eachvertex(v.Height * 0.0328084)) |> median"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "4.163929464037767"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Find the mean age difference in strong relationships\n",
    "@query(g |> filter(e.Trust > 0.8) |> eachedge(s.Age - t.Age)) |> abs |> mean"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Graph(1957 vertices, 29901 edges, Symbol[:Age,:Height,:Weight] vertex properties, Symbol[:Trust] edge properties)"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Find fred's 3 hop neighborhood (friends and friends-of-friends and so on)\n",
    "fred_nhood = hopgraph(g, \"fred\", 3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.5495668265206273"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# See how well younger players in fred's neighborhood trust each other\n",
    "@query(fred_nhood |> filter(v.Age > 30) |> eachedge(e.Trust)) |> mean"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Graph(1615 vertices, 23569 edges, Symbol[:Age,:Height,:Weight] vertex properties, Symbol[:Trust] edge properties)"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Find the 2 hop neighborhood of 2 separate vertices (multi seed traversal)\n",
    "sg = hopgraph(g, [\"nikolay\", \"jbert\"], 3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Generate an edge distance property on the inverse of normalized-trust\n",
    "dists = @query(sg |> eachedge(1 / e.Trust ));\n",
    "seteprop!(sg, :, dists, :Dist);"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Graph(1615 vertices, 22108 edges, Symbol[:Age,:Height,:Weight] vertex properties, Symbol[:Trust,:Dist] edge properties)"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Trim edges of very high distance\n",
    "sg = @query(sg |> filter(e.Dist < 10))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{1615, 22108} directed graph"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Export the graph's adjacency matrix\n",
    "M = export_adjacency(sg)\n",
    "lg = LightGraphs.DiGraph(M)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Export the edge distance property\n",
    "D = export_edge_property(sg, :Dist);"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Compute betweenness centrailty\n",
    "centrality = LightGraphs.betweenness_centrality(lg)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Set the centrality as a vertex property\n",
    "setvprop!(sg, :, centrality, :Centrality);"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Apply all pair shortest paths on the graph\n",
    "apsp = LightGraphs.floyd_warshall_shortest_paths(lg, D).dists;"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Add the new shortest paths as a property to the graph\n",
    "eit = edges(sg);\n",
    "seteprop!(sg, :, [apsp[e.second,e.first] for e in eit], :Shortest_Dists);"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "│ VertexID │ Labels        │ Age   │ Height │ Weight  │ Centrality  │\n",
       "├──────────┼───────────────┼───────┼────────┼─────────┼─────────────┤\n",
       "│ 1        │ \"lkcl\"        │ 30.51 │ 190.5  │ 95.4545 │ 0.0352864   │\n",
       "│ 2        │ \"chalst\"      │ 27.16 │ 187.96 │ 79.5455 │ 0.0180542   │\n",
       "│ 3        │ \"jrf\"         │ 27.23 │ 182.88 │ 81.8182 │ 0.0145245   │\n",
       "│ 4        │ \"Astinus\"     │ 33.77 │ 190.5  │ 81.8182 │ 1.73845e-5  │\n",
       "│ 5        │ \"halcy0n\"     │ 30.8  │ 187.96 │ 90.9091 │ 0.00615578  │\n",
       "│ 6        │ \"mbp\"         │ 24.21 │ 182.88 │ 113.182 │ 0.0232976   │\n",
       "│ 7        │ \"sulaiman\"    │ 33.15 │ 198.12 │ 100.0   │ 0.00730561  │\n",
       "│ 8        │ \"crackmonkey\" │ 27.08 │ 185.42 │ 109.091 │ 0.00691493  │\n",
       "│ 9        │ \"ajv\"         │ 32.84 │ 180.34 │ 90.9091 │ 0.00599549  │\n",
       "│ 10       │ \"lukeh\"       │ 30.99 │ 185.42 │ 81.8182 │ 0.0074532   │\n",
       "│ 11       │ \"AndreyGolub\" │ 29.84 │ 193.04 │ 86.3636 │ 0.0         │\n",
       "⋮\n",
       "│ 1604     │ \"jwoolley\"    │ 40.66 │ 180.34 │ 77.2727 │ 1.26063e-5  │\n",
       "│ 1605     │ \"gozer\"       │ 26.75 │ 185.42 │ 104.545 │ 0.0         │\n",
       "│ 1606     │ \"rederpj\"     │ 24.76 │ 185.42 │ 103.182 │ 0.0         │\n",
       "│ 1607     │ \"elsharkco\"   │ 24.69 │ 182.88 │ 95.9091 │ 0.0         │\n",
       "│ 1608     │ \"netgod\"      │ 26.59 │ 185.42 │ 95.4545 │ 0.000582392 │\n",
       "│ 1609     │ \"hadess\"      │ 28.48 │ 175.26 │ 95.4545 │ 0.00283106  │\n",
       "│ 1610     │ \"largo\"       │ 33.57 │ 185.42 │ 88.6364 │ 0.000318805 │\n",
       "│ 1611     │ \"kazen\"       │ 22.52 │ 175.26 │ 84.0909 │ 2.99768e-5  │\n",
       "│ 1612     │ \"bluets\"      │ 31.63 │ 180.34 │ 102.273 │ 0.0         │\n",
       "│ 1613     │ \"secabeen\"    │ 28.56 │ 193.04 │ 90.9091 │ 0.0         │\n",
       "│ 1614     │ \"nikolay\"     │ 23.29 │ 180.34 │ 100.0   │ 0.0         │\n",
       "│ 1615     │ \"jbert\"       │ 31.84 │ 193.04 │ 85.9091 │ 0.0         │"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    " # Show new vertex descriptor\n",
    "VertexDescriptor(sg)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "│ Index │ Source     │ Target        │ Trust    │ Dist    │ Shortest_Dists │\n",
       "├───────┼────────────┼───────────────┼──────────┼─────────┼────────────────┤\n",
       "│ 1     │ \"lkcl\"     │ \"chalst\"      │ 0.753731 │ 1.32673 │ 1.32673        │\n",
       "│ 2     │ \"lkcl\"     │ \"jrf\"         │ 0.837243 │ 1.1944  │ 1.1944         │\n",
       "│ 3     │ \"lkcl\"     │ \"Astinus\"     │ 0.620516 │ 1.61156 │ 1.61156        │\n",
       "│ 4     │ \"lkcl\"     │ \"halcy0n\"     │ 0.704766 │ 1.41891 │ 1.41891        │\n",
       "│ 5     │ \"lkcl\"     │ \"mbp\"         │ 0.879317 │ 1.13725 │ 1.13725        │\n",
       "│ 6     │ \"lkcl\"     │ \"sulaiman\"    │ 0.352907 │ 2.83361 │ 2.33345        │\n",
       "│ 7     │ \"lkcl\"     │ \"crackmonkey\" │ 0.223243 │ 4.47942 │ 3.25504        │\n",
       "│ 8     │ \"lkcl\"     │ \"ajv\"         │ 0.427735 │ 2.3379  │ 2.3379         │\n",
       "│ 9     │ \"lkcl\"     │ \"AndreyGolub\" │ 0.896434 │ 1.11553 │ 1.11553        │\n",
       "│ 10    │ \"lkcl\"     │ \"fxn\"         │ 0.187012 │ 5.34724 │ 2.21906        │\n",
       "│ 11    │ \"lkcl\"     │ \"splork\"      │ 0.103399 │ 9.67129 │ 2.17231        │\n",
       "⋮\n",
       "│ 22097 │ \"largo\"    │ \"hadess\"      │ 0.953549 │ 1.04871 │ 1.04871        │\n",
       "│ 22098 │ \"largo\"    │ \"largo\"       │ 0.849266 │ 1.17749 │ 0.0            │\n",
       "│ 22099 │ \"kazen\"    │ \"teknix\"      │ 0.673166 │ 1.48552 │ 1.48552        │\n",
       "│ 22100 │ \"kazen\"    │ \"kroah\"       │ 0.658092 │ 1.51954 │ 1.51954        │\n",
       "│ 22101 │ \"kazen\"    │ \"kazen\"       │ 0.672931 │ 1.48604 │ 0.0            │\n",
       "│ 22102 │ \"bluets\"   │ \"teknix\"      │ 0.179145 │ 5.58207 │ 5.58207        │\n",
       "│ 22103 │ \"bluets\"   │ \"Stevey\"      │ 0.307009 │ 3.25723 │ 3.25723        │\n",
       "│ 22104 │ \"bluets\"   │ \"bluets\"      │ 0.473882 │ 2.11023 │ 0.0            │\n",
       "│ 22105 │ \"secabeen\" │ \"teknix\"      │ 0.607547 │ 1.64596 │ 1.64596        │\n",
       "│ 22106 │ \"nikolay\"  │ \"lkcl\"        │ 0.335673 │ 2.97909 │ 2.97909        │\n",
       "│ 22107 │ \"nikolay\"  │ \"chalst\"      │ 0.600938 │ 1.66407 │ 1.66407        │\n",
       "│ 22108 │ \"jbert\"    │ \"jrf\"         │ 0.577956 │ 1.73024 │ 1.73024        │"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Show the new edge descriptor\n",
    "EdgeDescriptor(sg)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Julia 0.5.0-rc2",
   "language": "julia",
   "name": "julia-0.5"
  },
  "language_info": {
   "file_extension": ".jl",
   "mimetype": "application/julia",
   "name": "julia",
   "version": "0.5.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
